Test–retest reliability of the FitMáx©-questionnaire in a clinical and healthy population

Purpose
The FitMáx© was developed as a questionnaire-based instrument to estimate Cardiorespiratory Fitness (CRF) expressed as oxygen uptake at peak exercise (VO2peak). Test–retest reliability is a clinometric measurement property that describes the stability of results over time when multiple measurements are performed. The present study aimed to assess the test–retest reliability of the FitMáx©-questionnaire in different patient groups.

Patients and methods
A total of 127 cardiac, pulmonary and oncology patients and healthy subjects aged 19–84 years who completed the questionnaire twice within an average of 18 days were included for analysis. Participants were in a stable clinical situation (no acute disease and not participating in a training program). To determine the test–retest reliability, the Intraclass Correlation Coefficient (ICC) and Standard Error of the Measurement (SEM) were calculated between the first (T0) and second (T1) administration of the questionnaires.

Results
An excellent agreement was found between the FitMáx©-questionnaire scores at T0 and T1, with an ICC of 0.97 (SEM 1.91) in the total study population and an ICC ranging from 0.93 to 0.98 (SEM 1.52–2.27) in the individual patient groups.

Conclusion
The FitMáx©-questionnaire proves to be reliable and stable over time to estimate CRF of patients and healthy subjects.

Trial registration
NTR (Netherlands Trial Register), NL8846. Registered 25 August 2020, https://trialsearch.who.int/Trial2.aspx?TrialID=NL8846

Supplementary Information
The online version contains supplementary material available at 10.1186/s41687-023-00682-9.


Introduction
Cardiorespiratory fitness (CRF) is an important variable that influences several health outcomes including quality of life [1,2]. Cardiopulmonary exercise testing (CPET) is the gold standard to objectively measure CRF expressed as the oxygen uptake at peak exercise (VO2peak) and is clinically used to determine the underlying cause of limitations in exercise capacity [3][4][5]. However, CPET is costly and labour-intensive, whereas Patient-Reported Outcome Measures (PROMs) are a simple, safe and cost-effective alternative, especially in repeated testing such as rehabilitation programs [6,7].
Máxima Medical Centre (MMC) developed the FitMáx©-questionnaire (FitMáx), which consists of only three single-answer, multiple-choice questions [8]. The FitMáx was developed to estimate cardiorespiratory fitness expressed in VO2peak based on the self-reported maximum capacity of walking, stair climbing and cycling. The FitMáx scores are combined with the subject's age, sex and Body Mass Index (BMI) to estimate VO2peak. A previous validation study showed a strong correlation between VO2peak estimated by the FitMáx (FitMáx-VO2peak) and VO2peak measured with CPET (CPET-VO2peak), r = 0.94 (0.92-0.95), ICC = 0.93 (0.91-0.95), and Standard Error of the Estimate (SEE) of 4.14 ml/kg/min. Moreover, FitMáx outperformed commonly used questionnaires such as the Veterans Specific Activity Questionnaire (VSAQ) and Duke Activity Status Index (DASI) [8][9][10].
The clinical usefulness and applicability of PROMs depend on several clinometric properties including validity, responsiveness and reliability [11,12]. Reliability is defined as the extent to which test results of subjects (whose condition has not changed) are the same over time. To assess such test-retest reliability of an instrument, repeated measures are performed under the same conditions [11,13]. In this way it is possible to quantify the proportion of total variance in repeated measurements that is due to true differences in PROMs. The measurement error describes the systematic and random error of subjects' results that are not caused by true changes in the construct to be measured [11].
The present short report aimed to assess the test-retest reliability of the FitMáx in four different groups (healthy subjects, pulmonary, oncology, and cardiac patients) and in the total study population.

Setting
Pulmonary, oncology, and cardiac patients were recruited prospectively in MMC, Veldhoven and Eindhoven, the Netherlands. Healthy subjects were included at Ancora Health in Eindhoven, the Netherlands. The authorized Medical Research Ethics Committee of the MMC has reviewed the study protocol and concluded that the rules laid down in the Medical Research Involving Human Subjects Act (also known by its Dutch abbreviation WMO) do not apply to this study (reference number N20.086). The study was registered as NL8846 in the Netherlands Trial Register.

Study population
Subjects were eligible for inclusion if they were aged ≥ 18 years, had a good command of the Dutch language, and if no change in CRF was expected within 31 days from enrollment date. During their visit to MMC or Ancora Health, cardiac and pulmonary patients and healthy subjects who were scheduled to perform CPET, either for medical reasons or as part of a health check, were asked to participate in a study about CRF questionnaires. The CPET protocol is extensively described in our validation study [8]. Since oncology patients do not perform CPET as part of standard care, they were included from the outpatient clinic of the sports department without performing a CPET. Oncology patients were not eligible for inclusion when they were undergoing active disease-specific treatments, potentially affecting their CRF, within the study period. Similar to our validation study, subjects were asked to complete the FitMáx, VSAQ and DASI questionnaires. The questionnaires were administered in a paper format twice to the same subject. Subjects were excluded from analysis if the FitMáx was incomplete, or if the period between T0 and T1 was > 31 days. To minimize a possible 'subject expectancy effect', it was explicitly not explained that this was a study to determine the test-retest reliability of these questionnaires. All participants received a second information letter and questionnaire (T1) two weeks after T0. We did not explicitly question participants about experienced change in CRF. All participants gave written informed consent to the use of their anonymized CPET and questionnaire data.

Statistical analysis
We performed a sample size calculation with an expected ICC of 0.85, a minimum acceptable ICC of 0.60 and two measurements per individual, requiring a sample size of n = 26 per subject group to achieve a power of 80%.
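The source does not state which sample size method was used. As a minimal sketch, the widely used approximation of Walter, Eliasziw and Donner for reliability studies can be evaluated with the stated inputs (expected ICC 0.85, minimum acceptable ICC 0.60, two measurements, 80% power). The two-sided significance level of 0.05 and the function name `icc_sample_size` are assumptions made for this illustration only.

```python
from math import ceil, log
from statistics import NormalDist


def icc_sample_size(rho0, rho1, k=2, alpha=0.05, power=0.80, two_sided=True):
    """Approximate number of subjects needed to test H0: ICC = rho0 against H1: ICC = rho1.

    rho0 is the minimum acceptable ICC, rho1 the expected ICC and k the number
    of measurements per subject. The choice of this approximation and the
    default alpha are assumptions, not details taken from the study.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    theta0 = rho0 / (1 - rho0)  # variance ratio under H0
    theta1 = rho1 / (1 - rho1)  # variance ratio under H1
    c0 = (1 + k * theta0) / (1 + k * theta1)
    n = 1 + 2 * (z_alpha + z_beta) ** 2 * k / (log(c0) ** 2 * (k - 1))
    return ceil(n)


n_per_group = icc_sample_size(rho0=0.60, rho1=0.85, k=2)
print(n_per_group)
```

Under these assumptions the approximation yields 26 subjects per group, which matches the number reported above; a one-sided test would give a smaller number.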
Statistical analyses were performed using R, version 4.2.1 (R Foundation for Statistical Computing, Vienna, Austria) [14]. Normality of data was tested using the Shapiro-Wilk test, and checked qualitatively by means of histograms and Q-Q plots. Descriptive statistics were provided for demographic characteristics and reported as mean ± standard deviation (SD) in case of normal distribution, and as median and interquartile range (IQR) otherwise. For categorical variables, we reported frequencies and corresponding percentages.
Pearson correlation coefficient (r) was used to evaluate the linear relationship between CPET-VO2peak and Questionnaire-VO2peak at T0 [15].
To evaluate the test-retest reliability of the questionnaires, the Intraclass Correlation Coefficient (ICC) with 95% confidence interval (95%-CI) was determined (Two Way Mixed, Absolute Agreement, single measurement) [16]. The Standard Error of the Measurement (SEM, see Additional file 1: equations) [17] is a measure related to the ICC, but clinically easier to interpret because it is expressed in the same unit as the measurement of interest (VO2peak). The ICC and SEM were calculated between T0 and T1 for all questionnaires in all patient groups together, and for each patient group separately. An ICC < 0.50 indicates poor test-retest reliability, 0.50-0.75 indicates moderate test-retest reliability, 0.75-0.90 indicates good test-retest reliability, and > 0.90 indicates excellent test-retest reliability [16]. The higher the ICC, the lower the SEM and vice versa, but there is no standard measure for the SEM as it depends on the standard deviation of the data.
In addition, Bland-Altman plots were used to present systematic errors with 95% limits of agreement (95%-LoA), by plotting the difference between Questionnaire-VO2peak at T0 and T1 against the mean Questionnaire-VO2peak from T0 and T1 [18].
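To make the ICC and SEM described above concrete, the following sketch computes a single-measurement, absolute-agreement ICC from the two-way ANOVA mean squares, together with an SEM. The study's own SEM equation is in Additional file 1 and is not reproduced here; the common definition SEM = SD × sqrt(1 − ICC), with the SD pooled over all observations, is an assumption. The function names and the example scores are hypothetical.

```python
from math import sqrt


def icc_a1(scores):
    """Single-measurement, absolute-agreement ICC from a two-way layout.

    scores: one row per subject, one column per administration (e.g. T0, T1).
    """
    n, k = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between administrations
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


def sem_from_icc(scores, icc):
    """SEM = SD * sqrt(1 - ICC); the pooled SD over all observations is an assumption."""
    values = [x for row in scores for x in row]
    mean = sum(values) / len(values)
    sd = sqrt(sum((x - mean) ** 2 for x in values) / (len(values) - 1))
    return sd * sqrt(max(0.0, 1 - icc))


def interpret_icc(icc):
    """Reliability categories as listed in the text above."""
    if icc < 0.50:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.90:
        return "good"
    return "excellent"


# Hypothetical VO2peak estimates (ml/kg/min) at T0 and T1, invented for illustration.
example = [[20.1, 19.5], [31.4, 32.0], [15.2, 16.1], [27.8, 27.0], [23.3, 24.4]]
icc = icc_a1(example)
print(round(icc, 3), round(sem_from_icc(example, icc), 2), interpret_icc(icc))
```

Because the SEM scales with the SD of the data, the same ICC can correspond to different SEM values in groups with different spread, which is why no universal cut-off for the SEM exists.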
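The quantities behind a Bland-Altman plot can be sketched as follows: the mean difference (bias) and the 95% limits of agreement, taken as the bias ± 1.96 times the SD of the differences. The direction of the difference (T0 minus T1), the function name `bland_altman` and the example values are assumptions for illustration.

```python
from statistics import mean, stdev


def bland_altman(t0, t1, z=1.96):
    """Return the bias, 95% limits of agreement and the (mean, difference) pairs to plot."""
    diffs = [a - b for a, b in zip(t0, t1)]        # T0 minus T1 (assumed direction)
    means = [(a + b) / 2 for a, b in zip(t0, t1)]  # x-axis of the plot
    bias = mean(diffs)
    sd = stdev(diffs)
    return {
        "bias": bias,
        "loa_lower": bias - z * sd,
        "loa_upper": bias + z * sd,
        "points": list(zip(means, diffs)),
    }


# Hypothetical VO2peak estimates (ml/kg/min), invented for illustration.
result = bland_altman([20.1, 31.4, 15.2, 27.8, 23.3], [19.5, 32.0, 16.1, 27.0, 24.4])
print(result["bias"], result["loa_lower"], result["loa_upper"])
```

A bias close to zero indicates no systematic shift between administrations, while the width of the limits of agreement reflects the random variation between T0 and T1.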

Results
In this study, 213 subjects participated. A total of 73 subjects did not return the T1-questionnaire, resulting in a response rate of 66%. Eleven subjects returned it more than 31 days after T0 and, although we did not explicitly ask about it, two subjects reported on paper that their CRF had changed due to a COVID-19 infection; these subjects were excluded as well. As such, a total of 127 participants (84 men and 43 women) were included for analysis. The time between completing the questionnaires and CPET ranged from 11 to 31 days.
Since the data collection of some patient groups was completed sooner, we continued the data collection until a group of at least n = 26 was reached for every included patient group (pulmonary, oncology, cardiac and healthy subjects). The total study population's age ranged from 19 to 84 years. Ancora Health included healthy subjects during the COVID-19 period, using viral filters (MicroGard II, Vyaire Medical GmbH), which resulted in inaccurate data; we therefore omitted the VO2peak data of this group [19]. As mentioned before, oncology patients were included from the outpatient clinic and did not perform CPET as part of standard care. Therefore, we present the CPET data from the total group without the healthy subjects and oncology patients. In the so-obtained population, the median VO2peak was 21.94 (IQR 16.89-31.29) ml/kg/min, which is 94.1 (85.7-134.5)% of the predicted reference value for healthy Dutch persons of the same age and sex [20]. Anthropometrical data, CPET data and questionnaire data are presented in Tables 1 and 2. Data of VSAQ and DASI questionnaires can be found in Additional file 2: Table S1.

Test-retest reliability
The ICCs and corresponding 95%-CI for each patient group are displayed in Table 2. The ICC of the FitMáx-VO2peak between T0 and T1 in the total population was 0.97 (0.96-0.98). As a sensitivity analysis, we performed our ICC analysis in a two-way model examining potential systematic difference and found similar results, as expected. We found similarly high ICC values in the VSAQ [0.94 (0.92-0.96)] and DASI [0.90 (0.85-0.93)] (more information in Additional file 2: Table S1). A Bland-Altman plot is provided in Fig. 1 (Additional file 3: Figure S1 for all questionnaires) showing the difference between the two values of FitMáx-VO2peak at T0 and T1 against their mean. The mean difference was − 0.39 (95%-LoA − 5.68 to 4.84 ml/kg/min), 0.31 (95%-LoA − 8.75 to 9.37) and 0.20 (95%-LoA − 5.56 to 5.96) for FitMáx, VSAQ and DASI respectively.

Discussion
The use of PROMs to assess CRF seems a simple, safe and cost-effective alternative to objective measurement using CPET in clinical settings [7]. The applicability of such PROMs collected via self-reported questionnaires depends upon several clinometric properties. An important aspect in the validation of a new questionnaire is the test-retest reliability. The FitMáx showed an excellent test-retest reliability between the VO2peak estimated at T0 and T1, with an ICC of 0.97 (95%-CI 0.96-0.98) in the total population. In the different patient groups the ICC ranged from 0.93 to 0.98 for FitMáx, 0.83-0.95 for VSAQ and 0.84-0.95 for DASI. The ICC (and thus SEM) support the precision and reliability of the FitMáx, VSAQ and DASI.
A study by Ravani et al. [21] assessed the test-retest reliability of the DASI. The study was performed in pre-dialysis patients and patients who received a kidney transplant, and obtained an ICC of 0.71 and 0.81, respectively. These ICC values were lower than the ICC value(s) we found in the current study. This difference may be caused by the 6-month window they used in their study, which could have resulted in true CRF changes and therefore lower reliability [21].

Table 1 Participant characteristics
Results are displayed as n (%) and as median (IQR). Missing information, number of subjects: FEV1, 1; FVC, 1.
cm centimetres, COPD chronic obstructive pulmonary disease, CPET cardiopulmonary exercise testing, FEV1 forced expiratory volume in 1 s, FVC forced vital capacity, GOLD Global initiative for chronic obstructive lung disease, HR heart rate, kg kilograms, kg/m2 kilograms per square meter, L litres, min minutes, ml millilitres, n number of subjects, RER respiratory exchange ratio, T0 baseline measurement, T1 second measurement, VO2peak peak oxygen uptake, W watts
* Oncology patients did not perform a CPET.
† Most subjects (unknown number) in the healthy population performed a CPET with a viral filter during the COVID-19 period, resulting in unreliable CPET/spirometry parameters. To prevent confusion, we chose to omit these variables.
‡ Oncology patients excluded. Moreover, most subjects (unknown number) in the healthy population performed a CPET with a viral filter during the COVID-19 period, resulting in unreliable CPET parameters. Given this inaccuracy, the total group for this variable is only based on pulmonary and cardiac patients.
^ The prediction model for VO2peak of the LowLands Fitness Registry was used as a reference value [20].

Strengths
The strength of the current study lies in the diverse study population. We initially included healthy subjects, oncology, pulmonary, and cardiac patients. Although oncology patients and healthy subjects did not perform (valid) CPET, a wide range of VO2peak values was observed in the current study population. The VO2peak ranged from (extremely) low to above average [21.94 (9.8-53.3)]. The FitMáx proves to be widely applicable in a clinical population, with both low and high VO2peak. Moreover, the ICC values of the FitMáx vary little across the subject groups. Therefore, we conclude that the ICC is independent of the CPET-VO2peak and of the patient group in which CRF is estimated. Lastly, we minimized the 'subject expectancy effect', as the participants were not told that this study aimed to determine the test-retest reliability of the FitMáx, but only that they could possibly be approached a second time for the purpose of this study.

Clinical applicability
The FitMáx is an inexpensive tool with low burden for subjects to assess CRF. Moreover, the questionnaire proves to be effective in various populations and provides information on daily life activities in several dimensions (intensity, frequency and duration). The current study shows that the FitMáx is reliable to assess CRF over time when no change in CRF has occurred. This makes FitMáx a useful tool to assess self-reported CRF among patients and healthy subjects in clinical settings.

Limitations
The study reached a response rate of only 66%. This might be explained by patients assuming that they had already completed the exact same questionnaires before. The test-retest period used in the current study was on average 18 days, which could have been too short to prevent subjects from recalling their earlier FitMáx responses. However, following recommendations, we deliberately chose this short period in order to reduce reporting error in estimates of CRF due to fluctuating experienced physical fitness, especially in patients [2,22]. The small sample size prohibited statistical testing to compare the ICC between questionnaires. Although inspection of the ICC in the supplementary material might suggest a higher reproducibility for FitMáx in most patient groups, all three questionnaires showed high ICC values. This possible difference may not be statistically or clinically relevant.

Conclusion
The FitMáx proves to be highly reliable in repeated measures to assess CRF of patients with different conditions and healthy subjects, when no change in CRF was expected. This increases the applicability and clinical usefulness of the FitMáx.
T0 Baseline measurement
T1 Second measurement
VO2peak Oxygen uptake at peak exercise
VSAQ Veterans Specific Activity Questionnaire
W Watts
95%-CI 95% Confidence interval
95%-LoA 95% Limits of Agreement

Table 2
Intraclass correlations of the questionnaires between T0 and T1
The ICC is presented as its output with 95%-CI.
DASI Duke Activity Status Index, VSAQ Veterans Specific Activity Questionnaire
$ As known, the DASI has a ceiling effect, resulting in the maximal score in almost all healthy subjects, which prevents accurate examination of the (intraclass) correlation. As such, data of the DASI for healthy subjects are omitted from analysis.